METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR
INFORMATION IN A TEXT STRING CLASS 



RELATED APPLICATIONS 



The present invention is related to the subject matter of the
following commonly assigned, copending United States patent
applications: Serial No. 08/____ (Docket No. AT9-98-108) entitled
"METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR PROVIDING A USER
INTERFACE WITH ALTERNATIVE DISPLAY LANGUAGE CHOICES" and filed
____; Serial No. 08/____ (Docket No. ____) entitled "METHOD,
SYSTEM AND COMPUTER PROGRAM PRODUCT FOR CAPTURING LANGUAGE
TRANSLATION AND SORTING INFORMATION INTO A TEXT STRING CLASS" and
filed ____; Serial No. 08/____ (Docket No. AT9-98-____) entitled
"METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR SORTING TEXT
STRINGS" and filed ____; Serial No. 08/____ (Docket No.
AT9-98-408) entitled "METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT
FOR ROTATING THROUGH A SEQUENCE OF DISPLAY STATES IN A MULTI-FIELD
TEXT CLASS IN A ____" and filed ____, 1998; Serial No. 08/____
(Docket No. AT9-98-409) entitled "METHOD, SYSTEM AND COMPUTER
PROGRAM PRODUCT FOR CONTROLLING THE GRAPHICAL DISPLAY OF
MULTI-FIELD TEXT STRING OBJECTS" and filed ____; Serial No.
08/____ (Docket No. AT9-98-____) entitled "METHOD, SYSTEM AND
COMPUTER PROGRAM PRODUCT FOR DISPLAYING THE CONTENTS OF ALL FIELDS
IN A MULTI-FIELD TEXT STRING OBJECT" and filed ____, 1998; Serial
No. 08/____ (Docket No. AT9-98-411) entitled "METHOD, SYSTEM AND
COMPUTER PROGRAM PRODUCT FOR DYNAMIC LANGUAGE SWITCHING IN A
MULTI-FIELD TEXT STRING



0116AD-29939/59158.X 



AT9-98-160 






OBJECT VIA MESSAGING" and filed ____; and Serial No. 08/____
(Docket No. AT9-98-578) entitled "METHOD, SYSTEM AND COMPUTER
PROGRAM PRODUCT FOR AUTOMATIC CHARACTER TRANSLITERATION IN A TEXT
STRING OBJECT" and filed ____, 1998. The content of the
above-referenced applications is incorporated herein by reference.

BACKGROUND OF THE INVENTION



1. Technical Field: 

The present invention relates in general to text
strings in data processing systems and in particular to
encapsulation of identification, meaning, or pronunciation
information in text strings. Still more particularly, the
present invention relates to a multi-field text string
encapsulating identification, meaning, and pronunciation
information utilizing different character sets.

2. Description of the Related Art:

Multinational companies often run information system
(IS) networks which span multiple countries spread around
the globe. To maximize the usefulness of such networks,
operations within each locale tend to run in the
language of the region. Where possible, names of abstract
objects in user applications are in the local language and
match the local language organization, city, or human names
which the abstract objects represent. In the case of system
management software, often abstract objects would represent
each of a global enterprise's local offices.
Central management of such a global network may be
difficult or impossible when abstract object names utilize
the local language and the local language's underlying
character set. For some offices, abstract objects would most
naturally be named in Arabic; offices in Russia would name
objects utilizing the Cyrillic character set; and for offices
in Japan, objects would be named in Japanese. A problem arises,
however, when an enterprise's headquarters IS staff attempts to
examine these objects. The IS staff at the headquarters is
unlikely to be able to read Arabic or Japanese, or even
recognize Cyrillic characters.

Japanese, for example, is a logosyllabic or ideographic
language which does not have an alphabet representing simple
sounds, but instead has a very large character set with
symbols ("ideographs") corresponding to concepts and objects
rather than simple sounds. For instance, the Joyo Kanji List
(Kanji for Daily Use) adopted for the Japanese language in 1981
includes 1945 symbols. Users unfamiliar with the Kanji
characters will have difficulty identifying a particular
abstract object named in Japanese, as well as difficulty even
discussing such abstract objects over the telephone with an
English- and Japanese-speaking counterpart.

Additionally, merely seeing an ideograph may provide no
clue as to the correct meaning or pronunciation since, in
Japanese, the same character may have multiple meanings or
pronunciations. For instance, the character depicted in
Figure 7A may mean either "West" or "Spain"; the symbol
depicted in Figure 7B may be pronounced either "hayashi" or
"rin" (or "lin"); and the characters depicted in Figure 7C
may be pronounced "suga no," "suga ya," "kan no," or "kan
ya." This circumstance is based in part on the history of
the Japanese language, in which the Kanji characters were
adopted from the Chinese language in several waves. Thus,
for example, the "rin" symbol depicted in Figure 7B is
On-Yomi, basically a simulation of the Chinese pronunciation
when the character was imported to Japan, while "hayashi" is
Kun-Yomi, a Japanese word assigned to the character which
has the same meaning.

It would be desirable, therefore, to provide a
multi-field text string encapsulating identification,
meaning, and pronunciation information utilizing different
character sets.
SUMMARY OF THE INVENTION

It is another object of the present invention to
provide a method, system and computer program product for
encapsulating identification, meaning, or pronunciation
information in text strings employed by data processing
systems.

It is yet another object of the present invention to
provide a method, system and computer program product for
implementing a multi-field text string encapsulating
identification, meaning, and pronunciation information
utilizing different character sets.

The foregoing objects are achieved as is now described.
A multi-field text string data structure is employed to
encapsulate identification, meaning, and pronunciation
and/or sorting information for a text string. The first field
contains the characters of the language in which the text
string is entered, which may be latin characters, characters
which sound-map to latin characters, or one or more
ideographs. The second field contains either the same
characters as the first field or an intermediate
representation of the text string, such as syllabary
characters for a phonetic spelling of the ideograph(s). The
third field contains either the same characters as the first
field or a phonetic spelling of the text string. The first
field thus contains the text string in the language in which
the text string was entered, while the second and third
fields contain information about the meaning and
pronunciation of the text string. When the characters in the
first field are unrecognizable to a user, or when the
characters in the first field have more than one meaning or
more than one pronunciation, the contents of the second and
third fields allow the user to recognize the text string and
the meaning and pronunciation of the text string.

The above as well as additional objects, features and
advantages of the present invention will become apparent in
the following detailed written description.
BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the
invention are set forth in the appended claims. The
invention itself however, as well as a preferred mode of
use, further objects and advantages thereof, will best be
understood by reference to the following detailed
description of an illustrative embodiment when read in
conjunction with the accompanying drawings, wherein:

Figure 1 depicts a diagram of a data processing system
in which a preferred embodiment of the present invention may
be implemented;

Figure 2 is a diagram of a multi-field text string
class employed in encapsulating identification, meaning, and
phonetic spelling information in accordance with a preferred
embodiment of the present invention;

Figure 3 depicts a high level flowchart for a process
of entering data into a multi-field text string class in
accordance with a preferred embodiment of the present
invention;

Figures 4A-4B are portions of a user interface showing
one application for character-mapped data entry into
fields of a multi-field text string class in
accordance with a preferred embodiment of the present
invention;

Figure 5 is a high level flowchart for a process of
data entry in a logosyllabic language into a multi-field
text string class in accordance with a preferred embodiment
of the present invention;

Figures 6A-6G are illustrations of user interface
displays for a process of data entry in a logosyllabic
language into a multi-field text string class in accordance
with a preferred embodiment of the present invention; and

Figures 7A-7C are pictorial representations of known
ideographs having multiple meanings or pronunciations.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

With reference now to the figures, and in particular
with reference to Figure 1, a block diagram of a data
processing system in which a preferred embodiment of the
present invention may be implemented is depicted. Data
processing system 100 may be, for example, one of the
Aptiva® models of personal computers available from
International Business Machines Corporation of Armonk, New
York. Data processing system 100 includes a processor,
which in the exemplary embodiment is connected to a level
two (L2) cache 104, which is connected in turn to a system
bus 106. In the exemplary embodiment, data processing
system 100 includes a graphics adapter connected to system
bus 106, receiving user interface information for display
120.

Also connected to system bus 106 is system memory 108
and input/output (I/O) bus bridge 110. I/O bus bridge 110
couples I/O bus 112 to system bus 106, relaying and/or
transforming data transactions from one bus to the other.
Peripheral devices such as nonvolatile storage, which
may be a hard disk drive, and a keyboard/pointing device,
which may include a conventional mouse, a trackball, or the
like, are connected to I/O bus 112.

The exemplary embodiment shown in Figure 1 is provided
solely for the purposes of explaining the invention and
those skilled in the art will recognize that numerous
variations are possible, both in form and function. For
instance, data processing system 100 might also include a
compact disk read-only memory (CD-ROM) or digital video disk
(DVD) drive, a sound card and audio speakers, and numerous
other optional components. All such variations are believed
to be within the spirit and scope of the present invention.
Data processing system 100 and the Java implementation
examples below are provided solely as examples for the
purposes of explanation and are not intended to imply
architectural limitations. Those skilled in the art will
recognize the numerous programming languages which may be
utilized, all of which are believed to be embraced within the
spirit and scope of the invention.

Referring to Figure 2, a diagram of a multi-field text
string class employed in encapsulating identification,
meaning, and pronunciation and/or sorting information in
accordance with a preferred embodiment of the present
invention is depicted. A fundamental problem in multinational
computing environments which need to display data in multiple
human languages is that a spoken word generally encapsulates
information in multiple aspects or attributes, such as
through the word's meaning, from context, and/or from
inflection. When reduced to a visual or electronic
representation for manipulation or display in a data
processing system, the word may lose some attributes and much
of the associated meaning. Most importantly for data
processing systems, a visual representation of a word may
give no clues as to the correct translation or pronunciation
of the word or the proper placement of a word within a
specified sort order. International String ("IString") class
202 may be employed to address this problem.

IString class 202 is preferably a Java class similar to
the Java String class. The original behavior of the String
class should be preserved, with additional functionality
added and utilized only as needed. IString class 202 is a
datatype which captures some of the meaning of spoken words
which is normally lost when the word is reduced to a visual
representation. IString class 202 is preferably utilized for
all object names and system messages within a system.

The IString class 202 structure includes three
different strings for each name, message, data, or text
object: a baseString 204, a sortString 206, and an altString
208. BaseString 204 is the string within IString class 202
employed by default in the user interface display and may
contain any text, usually the original text entered by the
user. SortString 206 may also be any text and is employed to
allow correct sorting of non-phonetic languages and languages
which are difficult to sort based only on the binary value of
baseString 204. AltString 208 may be any text but should
conventionally be filled with a latin character set
representation of the pronunciation of the data contained in
baseString 204. Thus, IString class 202 includes the original
text (baseString 204), a sort key (sortString 206), and a
pronunciation key (altString 208) for object names, system
messages, and other data.

When implemented in Java, a constructor for an IString
class 202 object may be composed of the following fields:

    /** The base text String */
    protected String baseString;

    /** The related text String for proper collation */
    protected String sortString;

    /** The related alternate text String (pronunciation key) */
    protected String altString;

    /** The source locale, as an ISO-3166 code; used for collation */
    protected String sourceLocale;

    /** The source language, as an ISO-639 code */
    protected String sourceLanguage;

    /** The source variant defined for EBCDIC and case mapping */
    protected String sourceVariant;

    /** The target locale, as an ISO-3166 code */
    protected String targetLocale;

    /** The target language, as an ISO-639 code */
    protected String targetLanguage;

    /** The target variant defined for EBCDIC and case mapping */
    protected String targetVariant;

The ISO-3166 and ISO-639 codes are readily available from
a variety of sources on the Internet.

Table I illustrates how data within the IString data
type 202 looks when represented as a table:

Table I



A constructor for a new, empty IString class object 202
where the contents are independent of language or locale may
be:

    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Allocate a new, empty IString in the default
     * locale. </p>
     *
     **************************************************/
    public IString() {
        this.baseString = new String();
        this.sortString = new String();
        this.altString = new String();
        init();
    }

To allow objects of the IString class 202 datatype to be
stored in an Object Database (ODB), and to permit
manipulation of IString data by Common Object Request Broker
Architecture (CORBA) applications, an Interface Definition
Language (IDL) class should be defined:

    struct IString {
        string baseString;      //base text String
        string sortString;      //related text String for collation
        string altString;       //related alternate text String (pronunciation)
        string sourceLocale;    //source locale as an ISO-3166 code
        string sourceLanguage;  //source language as an ISO-639 code
        string sourceVariant;   //source variant code
        string targetLocale;    //target locale as an ISO-3166 code
        string targetLanguage;  //target language as an ISO-639 code
        string targetVariant;   //target variant code
    };

The contents of baseString 204, sortString 206, and
altString 208 are text entered by data entry methods 210 of
IString class 202. Data entry methods 210, and thus the
contents of baseString 204, sortString 206, and altString
208, may depend at least in part on language and locale
parameters defined by sourceLocale field 212, sourceLanguage
field 214, targetLocale field 216, and targetLanguage field
218.

Because data entry methods 210 are dependent on the
locale and/or language employed by the underlying host
system, creation of a new IString object 202 preferably
results in the locale and language properties of the host
system in which the IString object 202 is created being
placed in sourceLocale field 212 and sourceLanguage field
214. A constructor for allocating a new, empty IString for
a specified locale and language determined from the host
system in which the IString class object 202 is being
created may be:
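    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Allocate a new, empty IString in the
     * specified locale. </p>
     *
     **************************************************/
    public IString(Locale loc) {
        this.baseString = new String();
        this.sortString = new String();
        this.altString = new String();
        this.sourceLocale = loc.getLocale();
        this.sourceLanguage = loc.getLanguage();
        init();
    }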

Input of data into an IString class 202 object is
preferably locale- or language-dependent. The sourceLanguage
and targetLanguage properties 214 and 218 control how data is
input into an IString class object 202 by data input methods
210. The sourceLanguage property 214 may be set to the
language property of the host system on which the IString
class object is created. The targetLanguage property 218 may
also be set to that language, or may alternatively be set to
a common, "universal" language such as English. Data input
methods 210 compare sourceLanguage and targetLanguage
properties 214 and 218 to determine what is entered into
baseString 204, sortString 206, and altString 208 in an
IString class object 202.

Character strings are entered into the baseString 204,
sortString 206, and altString 208 fields by data input
methods 210, which may utilize data from either the user's
direct entry or specification, from transliteration engine
220, or from the Input Method Editor (IME) 224. Where the
targetLanguage property 218 is set to English as a default,
data entry methods 210 determine the contents of baseString
204, sortString 206, and altString 208 fields based upon the
character set employed by the language in which data is
entered by the user (sourceLanguage property 214).

For languages which employ the latin character set, the
user input is placed by data entry methods 210 into all
three fields (baseString 204, sortString 206, and altString
208) of the IString class 202.

    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Allocate a new IString which contains the same sequence of
     * characters as the string argument in the specified locale.</p>
     *
     **************************************************/
    public IString(String str, Locale loc) {
        this.baseString = new String(str);
        this.sortString = new String(str);
        this.altString = new String(str);
        this.sourceLocale = loc.getLocale();
        this.sourceLanguage = loc.getLanguage();
        init();
    }

For most locales and languages, the entered string will be
input into all three fields of the IString object 202. If
targetLanguage property 218 were not set to English, data
entry methods 210 would input the user-entered text into all
three fields whenever the languages identified in
sourceLanguage and targetLanguage properties 214 and 218
employ a common character set (e.g., both employ latin
characters, as in the case of Spanish and Afrikaans).

Table II illustrates how data is entered into IString
class 202 fields where the host language and locale utilize
the latin character set:

    Field            Type          Data
    baseString       Java String   Hetherington
    sortString       Java String   Hetherington
    altString        Java String   Hetherington
    sourceLocale     Java String   US
    sourceLanguage   Java String   en

    Table II

If desired, the fields may be individually edited and the
object artificially promoted for sorting purposes by
inserting a string having a lower sort value (e.g., a
prefixed form of "Hetherington") into sortString 206.
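Ordering entries by the sort field while displaying the base field can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in this specification: the `NameEntry` class, the `displayOrder` helper, and the `AAA_` prefix used to promote an entry are all hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for an object name with a display string and a sort key. */
final class NameEntry {
    final String baseString; // text shown to the user
    final String sortString; // text used only for ordering

    NameEntry(String baseString, String sortString) {
        this.baseString = baseString;
        this.sortString = sortString;
    }

    /** Returns the display strings ordered by the natural String order of their sort keys. */
    static List<String> displayOrder(List<NameEntry> entries) {
        List<NameEntry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing((NameEntry e) -> e.sortString));
        List<String> result = new ArrayList<>();
        for (NameEntry e : sorted) {
            result.add(e.baseString);
        }
        return result;
    }
}
```

With an assumed sort key of "AAA_Hetherington", the entry displayed as "Hetherington" is listed ahead of an entry whose display string and sort key are both "Baker", although the displayed text is unchanged.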

For languages which do not employ the latin character
set, but which utilize a character set which may be
sound-mapped to the latin character set, the user input is
placed by data entry methods 210 into baseString 204 and
sortString 206, but a transliterated, phonetic representation
of the input is placed in altString 208. An internal method
within the transliteration engine 220 is employed to
sound-map the passed string to a phonetic, latin character
representation for altString 208, transliterating entered
characters into other characters understandable to people
who are not familiar with the character set of the original
language.

To generate the contents of altString 208,
transliteration engine 220 selects an appropriate Java
resource file 222 containing a mapping table to create the
alternate text to be placed in altString 208. The selection
of the particular resource file which is employed is based
on the combination of source and target languages. Java
resource files 222 are named for the combination of languages
for which the mapping is being performed. In the example
shown in Figure 2, ru_en.class is for mapping Russian
(Cyrillic characters) to English (latin characters). The
structure of resource file 222 is a table with associated
entries for foreign language characters and corresponding
latin characters.

A suitable constructor for an IString object in which
altString 208 is transliterated from the passed string may
be:
    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Allocate a new IString. The baseString and sortString are the
     * passed string, the altString is transliterated into the target
     * language. </p>
     *
     **************************************************/
    public IString(String str) {
        this.baseString = new String(str);
        this.sortString = new String(str);
        if (isSameLanguage())
            this.altString = new String(str);
        else
            this.altString = transmogrify(str,
                                          this.sourceLanguage,
                                          this.targetLanguage);
    }

The "transmogrify" method is the internal method within
transliteration engine 220 which was described above. The
character set into which the entered characters are
transliterated is determined from the targetLanguage property
218, which in the exemplary embodiment is assumed to be set
to English. Given an appropriate resource file 222,
however, characters may be transliterated between any two
languages for which characters in one language sound-map to
one or more characters in the other.
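A table-driven sound mapping of this kind can be sketched as below. The sketch assumes a small in-memory map in place of a resource file; the class name `TransliterationSketch`, the method `transliterate`, and the six Cyrillic-to-latin entries are illustrative assumptions and do not reproduce the contents of any resource file 222.

```java
import java.util.Map;

/** Hypothetical sketch of sound-mapping characters through a lookup table. */
final class TransliterationSketch {
    // Assumed, deliberately partial Cyrillic-to-latin table (illustration only).
    private static final Map<Character, String> RU_EN = Map.ofEntries(
        Map.entry('\u041C', "M"),  // Cyrillic capital EM
        Map.entry('\u043E', "o"),  // Cyrillic small O
        Map.entry('\u0441', "s"),  // Cyrillic small ES
        Map.entry('\u043A', "k"),  // Cyrillic small KA
        Map.entry('\u0432', "v"),  // Cyrillic small VE
        Map.entry('\u0430', "a")); // Cyrillic small A

    /** Replaces each mapped character; unmapped characters pass through unchanged. */
    static String transliterate(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            String mapped = RU_EN.get(c);
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }
}
```

Because one source character may map to several latin characters, the table values are strings rather than single characters. Passing unmapped characters through unchanged is a design choice of this sketch, not something the specification states.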

Table III illustrates how data is entered into IString
class 202 by data entry methods 210 where the language
utilizes a non-latin character set which maps to the latin
character set, such as Russian Cyrillic.

Table III

In the example shown, the text entered by the user is
inserted into both baseString 204 and sortString 206, but
the text entered into altString 208 is selected by
transliteration engine 220 utilizing a resource table of
Russian Cyrillic to English character sound mappings. The
text entered into baseString 204 is thus transliterated into
altString 208 to provide a pronunciation key for users
unfamiliar with the Cyrillic character set.

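For languages which do not utilize the latin character
set or a character set which may be sound-mapped to the
latin character set, data entry methods 210 input data into
the baseString 204, sortString 206, and altString 208 fields
which is derived from the input method editor (IME) 224.
IME 224 may be either a customized input method editor or
the input method editor which is integrated into Asian
versions of the Windows NT operating system available from
Microsoft Corporation of Redmond, Washington. If the Windows
NT input method editor is employed, the appropriate data
must be extracted from the Windows NT input method editor's
internal data storage.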
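The three data entry cases described above (latin character sets, character sets that sound-map to latin, and everything else) amount to a three-way decision on the source language. A minimal sketch of that decision follows; the enum `EntryMode`, the method `modeFor`, and the example language sets are assumptions made for illustration, since the specification does not list which languages fall into each category.

```java
import java.util.Set;

/** Hypothetical sketch of choosing how the three text fields are populated. */
final class EntryModeSketch {
    enum EntryMode { COPY_TO_ALL_FIELDS, TRANSLITERATE_ALT_STRING, USE_INPUT_METHOD_EDITOR }

    // Assumed example categories keyed by ISO-639 code; not taken from the source.
    private static final Set<String> LATIN = Set.of("en", "es", "fr", "it", "af");
    private static final Set<String> SOUND_MAPPED = Set.of("ru");

    /** Returns the entry mode for text typed in the given source language. */
    static EntryMode modeFor(String sourceLanguage) {
        if (LATIN.contains(sourceLanguage)) {
            return EntryMode.COPY_TO_ALL_FIELDS;       // same text in all three fields
        }
        if (SOUND_MAPPED.contains(sourceLanguage)) {
            return EntryMode.TRANSLITERATE_ALT_STRING; // alt field gets a latin spelling
        }
        return EntryMode.USE_INPUT_METHOD_EDITOR;      // fields come from the IME
    }
}
```

The sketch assumes the target language is English, as in the exemplary embodiment; a fuller version would compare source and target character sets rather than consult fixed sets.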
Table IV illustrates how data is entered into IString
class 202 by data entry methods 210 for logosyllabic
languages, which employ neither the latin character set nor
a character set which may be sound-mapped to the latin
character set:

    Field            Type          Data
    baseString       Java String   <Kanji>
    sortString       Java String   <hiragana>
    altString        Java String   hayashi
    sourceLocale     Java String   JP
    sourceLanguage   Java String   ja
    targetLocale     Java String   US
    targetLanguage   Java String   en

    Table IV

Logosyllabic languages do not have alphabets, but instead
have very large character sets with symbols ("ideographs")
corresponding to concepts and objects rather than simple
sounds. For instance, the Joyo Kanji List (Kanji for Daily
Use) adopted for the Japanese language in 1981 includes 1945
symbols. Normal computer keyboards cannot contain enough
separate keys to have one for each symbol in the language,
so input is accomplished phonetically utilizing keystroke
combinations to select characters from one of two phonetic
syllabaries, hiragana or katakana, and dictionary lookup for
Kanji symbol creation. The process is implemented in the
Windows NT input method editor identified above.

For logosyllabic or ideographic languages, therefore,
the data entered into altString 208 is the latin characters
typed by the user to compose the desired ideograph. The
data entered into sortString 206 are the syllabary
characters phonetically spelling the desired ideograph,
providing an intermediate representation of the ideograph.
The data entered into baseString 204 is the final ideograph
selected by the user. As with transliteration of non-latin
characters as described above, non-latin characters may be
entered into altString 208 if the targetLanguage property is
set to a language other than English and IME 224 supports
composition of the ideographs by phonetic spelling in a
language other than English. For instance, an IString
object 202 might contain Japanese Kanji in baseString 204,
hiragana in sortString 206, and Cyrillic characters in
altString 208 if IME 224 permits composition of Japanese
Kanji characters by phonetic spelling in Russian.

A suitable constructor for receiving baseString 204,
sortString 206 and altString 208 from IME 224 via data entry
methods 210 for entry into an IString object 202 may be:
    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Allocate a new IString. The baseString, sortString and
     * altString are entered from the IME utilizing the default language and
     * locale. </p>
     *
     **************************************************/
    public IString(String base,
                   String sort,
                   String alt,
                   Locale src,
                   Locale tgt) {
        this.baseString = base;
        this.sortString = sort;
        this.altString = alt;
        this.sourceLocale = src.getLocale();
        this.sourceLanguage = src.getLanguage();
        this.targetLocale = tgt.getLocale();
        this.targetLanguage = tgt.getLanguage();
        init();
    }

The contents of baseString 204, sortString 206 and altString
208 are entered into the respective fields from data derived
from IME 224, while the contents of sourceLocale 212 and
sourceLanguage 214 are entered from the default locale and
language properties specified by the host system in which
data is being entered into IString object 202. The contents
of targetLocale 216 and targetLanguage 218 will typically be
a locale/language code for a language utilizing the latin
character set such as "en_US" (English - United States).

Regardless of the language in which text is entered
into an IString class object 202, the data automatically
entered into each of the baseString 204, sortString 206, and
altString 208 fields may be overridden or altered using other
methods. The fields of an IString object 202 may preferably
be individually and independently edited, allowing artificial
promotion within sortString field 206 as described above,
replacement of an erroneously selected ideograph in
baseString field 204, or correction of a phonetic spelling
within altString field 208.

While the above-described methods assumed that the
source and target languages were taken from host system
defaults, data may be entered into baseString 204,
sortString 206 and altString 208 for specified source and
target languages utilizing the constructor:
    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Allocate a new IString for the specified source and target
     * language and locale. </p>
     *
     **************************************************/
    public IString(String base,
                   String sort,
                   String alt,
                   String srcLanguage,
                   String srcLocale,
                   String tgtLanguage,
                   String tgtLocale) {
        this.baseString = base;
        this.sortString = sort;
        this.altString = alt;
        this.sourceLocale = srcLocale;
        this.sourceLanguage = srcLanguage;
        this.targetLocale = tgtLocale;
        this.targetLanguage = tgtLanguage;
        init();
    }

This latter constructor may be employed to create an
IString object 202 in other than the host system default
language, or in host systems where data for the
IString object 202 is received from another system and a
local instance is created.

It should be noted that transliteration engine 220 and
messaging methods 226 need not necessarily be implemented
within an IString class 202 as depicted in Figure 2, and
that IME 224 need not be implemented separately.
Transliteration engine 220 and messaging methods 226 may
instead be implemented separately from IString class 202,
while IME 224 may be implemented as a
method within IString class 202.

Transliteration engine 220 and IME 224 are only
required by data entry methods 210 to gather input data for
IString class 202 objects under certain locale and language
property settings. Otherwise, data may be programmatically
input into baseString 204, sortString 206, and altString 208
by invoking the proper constructor. The methods which may
be invoked by programs at runtime to programmatically get
and set fields within IString 202 include:



    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Get the IString baseString.</p>
     *
     * @returns str String containing the base string
     *
     **************************************************/
    public String getBaseString() {
        return this.baseString;
    }

This method returns the contents for baseString 204 from an
IString object 202. Similar methods return the contents of
sortString 206 and altString 208:

    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Get the IString sortString.</p>
     *
     * @returns str String containing the sort string
     *
     **************************************************/
    public String getSortString() {
        return this.sortString;
    }

    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Get the IString altString.</p>
     *
     * @returns str String containing the alt string
     *
     **************************************************/
    public String getAltString() {
        return this.altString;
    }

Methods are likewise provided for setting baseString 204:

    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Set the IString baseString.</p>
     *
     * @param str String containing the base string
     *
     **************************************************/
    public void setBaseString(String sBase) {
        this.baseString = sBase;
    }

as well as sortString 206 and altString 208:

    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Set the IString sortString.</p>
     *
     * @param str String containing the sort string
     *
     **************************************************/
    public void setSortString(String sSrt) {
        this.sortString = sSrt;
    }

    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Set the IString altString.</p>
     *
     * @param str String containing the alt string
     *
     **************************************************/
    public void setAltString(String sAlt) {
        this.altString = sAlt;
    }

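In addition to getting and setting baseString 204,
sortString 206, and altString 208, programs may need to get
or set the display locale or language of an IString object
202. Accordingly, other methods are provided to permit a
program to get and/or set the locale or language properties
of IString data: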
    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Get the locale of the IString data.</p>
     *
     * @returns loc Locale containing the locale of the data
     *
     **************************************************/
    public Locale getLocale() {
        Locale loc = new Locale(this.sourceLanguage, this.sourceLocale);
        return loc;
    }

    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Set the locale of the IString data.</p>
     *
     * @param loc Locale of the data
     *
     **************************************************/
    public void setLocale(Locale loc) {
        this.sourceLocale = loc.getLocale();
        this.sourceLanguage = loc.getLanguage();
    }

    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Get the display language of the IString data.</p>
     *
     * @returns Display language of the data
     *
     **************************************************/
    public String getDisplayLanguage() {
        Locale loc = new Locale(this.sourceLanguage, this.sourceLocale);
        return loc.getDisplayLanguage();
    }

    /**************************************************
     *
     * <P></P>
     *
     * <dt><b>Description:</b><dd>
     * <p> Get the display locale of the IString data.</p>
     *
     * @returns Display locale of the data
     *
     **************************************************/
    public String getDisplayLocale() {
        if (this.sourceLanguage == null && this.sourceLocale == null)
            return null;
        else {
            Locale loc = new Locale(this.sourceLanguage, this.sourceLocale);
            return loc.getDisplayLocale();
        }
    }
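The listings above call getLocale() and getDisplayLocale() on a Locale object. In the standard java.util.Locale class the corresponding accessors are, as far as that API goes, getCountry() and getDisplayCountry(), so a version that compiles against the standard library might look like the sketch below. The class name `LocaleDisplaySketch` and its helper methods are hypothetical, and the display locale is fixed to English here only so that the results do not depend on the host default.

```java
import java.util.Locale;

/** Hypothetical sketch using only standard java.util.Locale calls. */
final class LocaleDisplaySketch {
    /** Builds a Locale from an ISO-639 language code and an ISO-3166 country code. */
    static Locale toLocale(String language, String country) {
        return new Locale.Builder().setLanguage(language).setRegion(country).build();
    }

    /** Human-readable language name, rendered in English. */
    static String displayLanguage(String language, String country) {
        return toLocale(language, country).getDisplayLanguage(Locale.ENGLISH);
    }

    /** Human-readable country name, rendered in English. */
    static String displayCountry(String language, String country) {
        return toLocale(language, country).getDisplayCountry(Locale.ENGLISH);
    }
}
```

For example, the source language and locale codes "ja" and "JP" yield the display strings "Japanese" and "Japan".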

While these methods are available, IString class 202
preferably exhibits a "black box" behavior such that the
programmer/user need not know anything about the methods
implemented for IString class 202. IString class 202 simply
appears as a data type which encapsulates extra information
about baseString 204 and also includes some methods for
transforming characters from one character set to another.
For special cases where the sortString field 206 or
altString field 208 is to be exposed to the user in
addition to or in lieu of baseString 204, either for editing
or for display only, a separate set of controls may be
provided.

In the present invention, IString class 202 is employed
to effectively transfer human language data across systems
employing incongruous languages. The contents of baseString
204 provide a native representation of the text in the
default language of the system originating the IString
object 202. However, for each system participating in the
exchange of data with other systems running in different
human languages, the targetLocale property 216 and
targetLanguage property 218 of an IString object 202 are
preferably set to a common value (e.g., targetLocale="US",
targetLanguage="en"). The contents of altString 208 will
thus contain a common, cross-language representation of the
text string. In systems where the default language of a
system receiving an object differs from the language of the
contents of baseString 204, IString class object 202 may
automatically switch to presenting the contents of altString
208 as the text string to be displayed or processed.
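
The switching behavior described above may be sketched as follows. This is a minimal illustration only, not the IString class itself: the class name IStringSketch, the method textFor, and the reduction of the comparison to a simple language-code match are all assumptions made for the example.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for an IString object (not the actual class 202). */
final class IStringSketch {
    final String baseString;      // native representation of the text (field 204)
    final String altString;       // common, cross-language representation (field 208)
    final String sourceLanguage;  // language of baseString, e.g. "ja"

    IStringSketch(String baseString, String altString, String sourceLanguage) {
        this.baseString = Objects.requireNonNull(baseString);
        this.altString = Objects.requireNonNull(altString);
        this.sourceLanguage = Objects.requireNonNull(sourceLanguage);
    }

    /**
     * Returns baseString when the receiving system's default language matches
     * the language of baseString, and altString otherwise.
     */
    String textFor(String receiverLanguage) {
        return sourceLanguage.equals(receiverLanguage) ? baseString : altString;
    }
}
```

Under these assumptions, an object created with a Japanese baseString and an altString of "hayashi" would present the baseString on a system whose default language is "ja" and the altString on a system whose default language is "en".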

Referring to Figure 3, a high level flowchart for a
process of entering data into a multi-field text string
class in accordance with a preferred embodiment of the
present invention is depicted. Figure 3 is intended to be
read in conjunction with Figure 2. The process shown in
Figure 3 begins at step 302, which depicts initiation of
data entry into a multi-field text string class (IString)
object 202. The process then passes to step 304, which
illustrates a determination of whether the currently
selected language, specified by the operating system or
application environment language and/or locale properties,
employs the latin alphabet character set. Languages
supported by a given system may be categorized to facilitate
this determination, and the category employing the latin
alphabet should include English and the romance languages
(Spanish, French, Italian, etc.). If the current language
employs the latin alphabet character set, the process
proceeds to step 306, which depicts inserting the text
entered by the data entry keystrokes into all three fields
(baseString 204, sortString 206, and altString 208) of
IString object 202. Thus, in most locales and/or languages
for a locale, data is input programmatically by invoking the
appropriate constructor and the baseString text is inserted
into the other two fields by default.

Referring back to step 304, if the currently selected
language for data entry into the IString object does not
utilize the latin alphabet character set, the process
proceeds instead to step 308, which illustrates a
determination of whether the currently selected language
maps to the latin alphabet character set. This category of
languages will most likely include, for example, Cyrillic,
Greek, Hebrew, and many Germanic and Arabic languages. If
so, the process proceeds to step 310, which depicts storing
the text of the data entry keystrokes into the baseString
and sortString fields 204 and 206, and then to step 312,
which illustrates storing the entered text sound-mapped to
latin alphabet characters in the altString field 208.

It should be noted that there may be some overlap
between the first and second categories of languages, or,
stated differently, some language-dependent variation in the
manner in which language entry is handled. For example, for
Spanish text, while most characters may be entered directly
into all three fields of an IString class object, the ñ
character may be sound mapped to "ny" in the altString field
of an IString object to provide information regarding proper
pronunciation. Alternatively, the altString field may be
filled with a traditional phonetic pronunciation guide to
the data entered into the IString object (e.g., "kum-er" or
"koo-mer") to provide pronunciation information for words in
languages employing the latin alphabet character set as well
as for ideographs.

Referring back to step 308, if the current language
does not map readily to the latin alphabet character set
(e.g., the language employs an ideographic character set),
the process proceeds instead to step 314, which depicts
storing the data entry keystrokes in the altString field 208
as a pronunciation guide, then to step 316, which
illustrates storing intermediate characters (such as
hiragana or katakana characters) in the sortString field
206, and finally to step 318, which depicts storing the
ideograph in the baseString field 204. Steps 314, 316, and
318 illustrate the operation of the IME in storing data in
an IString object.
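
The three branches of Figure 3 may be summarized in a short sketch. Everything named here (IStringEntrySketch, ScriptCategory, Fields, and the caller-supplied soundMap function) is hypothetical and introduced only for illustration; in particular, how a language is categorized and how text is sound-mapped are left to the caller and are not specified by this sketch.

```java
import java.util.function.UnaryOperator;

/** Illustrative sketch of the Figure 3 branches; all names are assumptions. */
final class IStringEntrySketch {

    /** Assumed outcome of the determinations at steps 304 and 308. */
    enum ScriptCategory { LATIN, MAPS_TO_LATIN }

    /** Plain holder for the three IString fields. */
    static final class Fields {
        final String baseString;  // field 204
        final String sortString;  // field 206
        final String altString;   // field 208

        Fields(String baseString, String sortString, String altString) {
            this.baseString = baseString;
            this.sortString = sortString;
            this.altString = altString;
        }
    }

    /** Steps 306 and 310/312: keyboard entry for languages in the first two categories. */
    static Fields enter(String typed, ScriptCategory category, UnaryOperator<String> soundMap) {
        switch (category) {
            case LATIN:
                // Step 306: the entered text goes into all three fields.
                return new Fields(typed, typed, typed);
            case MAPS_TO_LATIN:
                // Steps 310 and 312: altString receives the sound-mapped text.
                return new Fields(typed, typed, soundMap.apply(typed));
            default:
                throw new IllegalArgumentException("unknown category: " + category);
        }
    }

    /** Steps 314, 316 and 318: keystrokes, syllabary symbols and ideograph from the IME. */
    static Fields enterIdeographic(String keystrokes, String syllabary, String ideograph) {
        return new Fields(ideograph, syllabary, keystrokes);
    }
}
```

For instance, calling enter with a sound-mapping function that replaces "ñ" with "ny" leaves the entered text in baseString and sortString and stores the mapped form in altString.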

From any of steps 306, 312, or 318, the process
proceeds to step 320, which illustrates the process becoming
idle until data entry to an IString class object is again
initiated. The distinct data entry behavior of the IString
class based on the locale or language property allows
language specific characters to be automatically mapped to
recognizable characters and saved as a pronunciation key. A
user may thus view the character-mapped representation of an
abstract object name to be able to recognize a specific
object, despite a lack of familiarity with the character set
in which the object name text string was entered.

Referring to Figures 4A and 4B, portions of a user
interface showing character-mapped data entry into
alternate fields of a multi-field text string class in
accordance with a preferred embodiment of the present
invention are illustrated. The user interface [...]
field as described above.

In Figure 4A, the baseString field contents ([...]) are
displayed. A user unfamiliar with that character set would
not be able to recognize this name. By altering the user
interface so that the altString field contents ("David
Kumhyr") are displayed as shown in Figure 4B, the user can
recognize the name of an object which they wish to
manipulate. The automatically transliterated characters
saved in the altString field thus provide a recognizable
representation of the text as well as a pronunciation key.

With reference now to Figure 5 and Figures 6A through
6G, a high level flowchart and corresponding user interface
displays for data entry in a logosyllabic language into a
multi-field text string class in accordance with a preferred
embodiment of the present invention are depicted. Figures 5 
and 6A-6G are intended to be read in conjunction with each 
other and with Figure 2, and depict the process of IME 224 
in generating the contents of baseString 204, sortString 
206, and altString 208 for an ideographic (or logosyllabic) 
language as described in connection with steps 314, 316, and 
318 of Figure 3. 

For Japanese and similar logosyllabic languages, IME 
224 monitors the keystrokes entered, selecting appropriate 
hiragana (or katakana) characters, and finally presents a 
list of possible matching Kanji symbols. The process begins
at step 502, which illustrates data entry being initiated 
with the language selected not mapping to the latin alphabet 
character set. The process may thus be performed between 
steps 308 and 314 depicted in Figure 3. From step 502, the 
process first passes to step 504, which illustrates a 
determination of whether a character has been entered by the 
user. Any character entry should be in latin alphabet 
characters, even for logosyllabic languages. As noted 
earlier, data input for such languages is accomplished 
phonetically utilizing combinations of latin alphabet 
characters to select symbols from phonetic syllabaries, with 
a dictionary lookup for the final ideograph. 

If no character was entered, the process loops at
step 504 to continue polling for character entry. If a
character is entered, however, the process proceeds instead
to step 506, which depicts adding the entered character into
altString 208. The process then passes to step 508, which
illustrates a determination of whether the entered
character, together with any previously entered characters
which have not yet been mapped to a syllabary symbol,
corresponds to a syllabary symbol. If not, the process
returns to step 504 to await further character input. If
so, however, the process proceeds instead to step 510, which
depicts adding the syllabary symbol corresponding to the
entered character(s) to sortString field 206.

From step 510, the process then passes to step 512, 
which illustrates a determination of whether the entered 
character, together with any previously entered characters 
which have not yet been mapped to an ideograph, corresponds 
to an ideograph. If not, the process returns to step 504 to 
await further character input. If so, however, the process 
proceeds instead to step 514, which depicts presenting the 
(potentially) matching ideograph or ideographs to the user 
for selection, then adding the selected ideograph to 
baseString 204. The process then returns to step 504 to 
await further input, and continues as described until 
interrupted by a control indicating the data entry is 
completed or terminated. Those skilled in the art will
recognize that some mechanism should be provided for
preventing termination of data entry when latin alphabet
characters entered to phonetically compose an ideograph do
not correspond to any available ideograph in the dictionary.

A specific example of data entry utilizing IME 224 is
illustrated for the data in Table III by Figures 6A through
6G. In composing the word "hayashi," the user would first
enter "h". This is not a valid hiragana character, so IME
224 will display the "h" as shown in Figure 6A, add the "h"
to the altString field 208, and wait for the next character
to be entered. The user next enters "a", forming the
phoneme "ha", for which IME 224 may temporarily display
"ha", then select and display "は" in lieu of "ha".
Alternatively, IME 224 may simply select and display "は" in
lieu of "h" as shown in Figure 6B. IME 224 also adds the
"a" to the altString field 208 and adds "は" to sortString
field 206.

Similarly, upon entry of "y" by the user, IME 224
displays the "y" as shown in Figure 6C and adds that
character to altString 208; upon entry of "a", IME 224
selects and displays "や" in place of the "y" as shown in
Figure 6D, then adds "a" to altString 208 and "や" to
sortString 206. Following user entry of "s", IME 224 adds
the character to altString 208 and displays the character as
shown in Figure 6E; when the user subsequently enters "h",
IME 224 adds the character to altString 208 and displays the
character as shown in Figure 6F.

Finally, upon user entry of "i", which causes the
entered text to correspond to a phonetic spelling of a Kanji
character, IME 224 adds "i" to altString 208 and selects and
adds "し" to sortString 206. As each latin character is
entered by the user and added to altString 208, or
alternatively as each hiragana/katakana syllabary symbol is
selected and added to sortString 206, IME 224 checks the
lookup dictionary for possible Kanji symbols corresponding
to the phonetic pronunciation entered. Upon determining a
match, IME 224 presents a list to the user for selection as
shown in Figure 6G. The user selection is subsequently
entered into baseString 204.
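
The keystroke handling of steps 504 through 514 may be sketched for the "hayashi" example as follows. The sketch is an illustration under stated assumptions rather than the operation of IME 224 itself: the class ImeSketch, its toy SYLLABARY and DICTIONARY tables (which hold only the entries this one word needs), and the shortcut of taking the single dictionary match in place of a user selection are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy input method sketch; tables and names are assumptions for illustration. */
final class ImeSketch {
    private static final Map<String, String> SYLLABARY = new HashMap<>();
    private static final Map<String, String> DICTIONARY = new HashMap<>();
    static {
        SYLLABARY.put("ha", "\u306F");       // hiragana "ha"
        SYLLABARY.put("ya", "\u3084");       // hiragana "ya"
        SYLLABARY.put("shi", "\u3057");      // hiragana "shi"
        DICTIONARY.put("hayashi", "\u6797"); // the single Kanji entry of this toy dictionary
    }

    final StringBuilder altString = new StringBuilder();   // latin keystrokes (field 208)
    final StringBuilder sortString = new StringBuilder();  // syllabary symbols (field 206)
    String baseString = "";                                 // selected ideograph (field 204)

    // Latin characters entered since the last syllabary symbol was produced.
    private final StringBuilder pending = new StringBuilder();

    /** Handles one entered latin character. */
    void type(char c) {
        altString.append(c);                                // step 506
        pending.append(c);
        String kana = SYLLABARY.get(pending.toString());    // step 508
        if (kana == null) {
            return;                                         // await further input (step 504)
        }
        sortString.append(kana);                            // step 510
        pending.setLength(0);
        String ideograph = DICTIONARY.get(altString.toString()); // step 512
        if (ideograph != null) {
            baseString = ideograph;                         // step 514, user selection elided
        }
    }
}
```

Feeding the characters of "hayashi" to type one at a time leaves "hayashi" in altString, the three hiragana symbols in sortString, and the Kanji symbol in baseString. A fuller implementation would present every matching ideograph for selection instead of taking the only entry in the table.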
Filtering input to IString object 202 through IME 224,
which may form a portion of the operating system, the 
application environment, or an editor, allows intermediate 
and alternative representations of a name or word to be 
captured along with the final form. This is particularly 
important for logosyllabic languages using ideographs, which 
are especially difficult to deal with for non-speakers of 
that language. Describing an ideograph over the telephone 
for assistance in determining meaning can be extremely 
challenging, a task further complicated where the same 
ideograph has multiple meanings or pronunciations. For 
instance, many Kanji symbols have multiple pronunciations 
and/or meanings. Therefore, merely seeing the characters 
does not provide enough information to know how to pronounce 
the name, which has resulted in the Japanese business card
ritual of presenting the card and pronouncing the name at the
same time. Capturing intermediate representations within
IString object 202 allows non-speakers to read and know how
to pronounce or match ideographic characters.

It is important to note that while the present 
invention has been described in the context of a fully 
functional data processing system and/or network, those 
skilled in the art will appreciate that the mechanism of the 
present invention is capable of being distributed in the 
form of a computer usable medium of instructions in a 
variety of forms, and that the present invention applies 
equally regardless of the particular type of signal bearing 
medium used to actually carry out the distribution. 
Examples of computer usable mediums include: nonvolatile,
hard-coded type mediums such as read only memories (ROMs) or
erasable, electrically programmable read only memories
(EEPROMs), recordable type mediums such as floppy disks,
hard disk drives and CD-ROMs, and transmission type mediums
such as digital and analog communication links.

While the invention has been particularly shown and 
described with reference to a preferred embodiment, it will 
be understood by those skilled in the art that various 
changes in form and detail may be made therein without 
departing from the spirit and scope of the invention.